Abstract

This paper studies the "explanation problem" for tree- and linearly-ordered array data, a
problem motivated by database applications and recently solved for the one-dimensional
tree-ordered case. In this paper, one is given a matrix $A = (a_{ij})$ whose rows and columns have
semantics: special subsets of the rows and special subsets of the columns are meaningful, others
are not. A submatrix in $A$ is said to be meaningful if and only if it is the cross product of a
meaningful row subset and a meaningful column subset, in which case we call it an "allowed
rectangle." The goal is to "explain" $A$ as a sparse sum of weighted allowed rectangles.
Specifically, we wish to find as few weighted allowed rectangles as possible such that, for all
$i,j$, $a_{ij}$ equals the sum of the weights of all rectangles which include cell $(i,j)$.

In this paper we consider the natural cases in which the matrix dimensions are tree-ordered
or linearly-ordered. In the tree-ordered case, we are given a rooted tree $T_1$ whose leaves are the
rows of $A$ and another, $T_2$, whose leaves are the columns. Nodes of the trees correspond in an
obvious way to the sets of their leaf descendants. In the linearly-ordered case, a set of rows or
columns is meaningful if and only if it is contiguous.

For tree-ordered data, we prove the explanation problem NP-Hard and give a randomized 
2-approximation algorithm for it. For linearly-ordered data, we prove the explanation problem 
NP-Hard and give a 2.56-approximation algorithm. To our knowledge, these are the first results 
for the problem of sparsely and exactly representing matrices by weighted rectangles. 

1 Introduction

This paper studies two related problems of "explaining" data parsimoniously. In the first part of
this paper, we focus on providing a top-down "hierarchical explanation" of "tree-ordered" matrix
data. We motivate the problem as follows. Suppose that one is given a matrix $A = (a_{ij})$ of data,
and that the rows naturally correspond to the leaves of a rooted tree $T_1$, and the columns, to the
leaves of a rooted tree $T_2$. For example, $T_1$ and $T_2$ could represent hierarchical IP address spaces
with nodes corresponding to IP prefixes. Each node of either $T_1$ or $T_2$ is then said to correspond to
the set of rows (or columns, respectively) corresponding to its leaf descendants. Say 128.* (i.e., the
set of all $2^{24}$ IP addresses beginning with "128", which happens to correspond to the .edu domain)
is a node in $T_1$ and 209.85.225.* (i.e., the set of all $2^{8}$ IP addresses beginning with 209.85.225,
which is www.google.com's domain) is a node in $T_2$. Then (128.*, 209.85.225.*) could, say,
represent the amount of traffic flowing from all hosts in the .edu domain (e.g., 128.8.127.3) to
all hosts in the www.google.com domain (e.g., 209.85.225.99). It is easy to relabel the rows
or columns so that each internal node of $T_1$ or $T_2$ corresponds to a contiguous block of rows or
columns.

We need a few definitions. Let us say a rectangle in an $m \times n$ matrix $A$ is a set
$Rect(i_1,i_2,j_1,j_2) = \{i : i_1 \le i \le i_2\} \times \{j : j_1 \le j \le j_2\}$, for some
$1 \le i_1 \le i_2 \le m$, $1 \le j_1 \le j_2 \le n$. Certain rectangles are allowed; others are not.
Let $\mathcal{R}$ denote the set of allowed rectangles. Say a set of $w(R)$-weighted rectangles $R$
represents $A = (a_{ij})$ if for any cell $(i,j)$, the sum of $w(R)$ over rectangles $R$ that contain
$(i,j)$ is $a_{ij}$.

Returning to the Internet example, a pair $(u, v)$, $u$ a node of $T_1$, $v$ a node of $T_2$, corresponds to
a rectangle. Say that a rectangle is allowed, relative to $T_1$ and $T_2$, if it is the cross product of the
set of rows corresponding to some node $u$ in $T_1$ and the set of columns corresponding to some node
$v$ in $T_2$. In this scenario, we attempt to "explain" or "describe" the matrix by writing it as a sum
of weighted allowed rectangles. Formally, we wish to assign a weight $w_R$ to each allowed rectangle
$R$ such that the set of weighted rectangles represents $A$.

Of course there is always a solution: one can just assign weights to the 1x1 rectangles. But 
this is a trivial description of the matrix. Usually more concise explanations are preferable. For 
this reason we seek an "explanation" with as few nonzero terms as possible. More precisely, we
seek to assign a weight $w_R$ to each allowed rectangle $R$ such that the set of weighted rectangles
represents $A$, and such that the number of nonzero weights $w_R$ assigned is minimized. (We define
problems formally in Section 4.)

Here is a 1-dimensional example. Suppose that a media retailer sells items in exactly four 
categories: action-movie DVD's, comedy DVD's, books, and CD's. The retailer builds a hierarchy 
with four leaves, one for each of the categories of items. A node "DVD's" is the parent of leaves 
"action-movie DVD's" and "comedy DVD's". There is one more node, a root labeled "all", with 
children "DVD's", "books", and "CD's". 

Suppose that one year, sales of action-movie DVD's increased by $6000 and sales of the other 
three categories increased by $8000 each. One could represent the sales data by giving those four 
numbers, one for each leaf of the hierarchy, yet one could more parsimoniously say that there was a 
general increase of $8000 for all (leaf) categories, in addition to which there was a decrease of $2000 
for action-movie DVD's. This is represented by assigning $8000 to node "all" and $-2000 to "action- 
movie DVD's". While many different linear combinations may be possible, simple explanations tend 
to be most informative. Therefore, we seek an answer minimizing the explanation size (the number 
of nonzero terms required in the explanation). 
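
The retailer example can be checked mechanically. The following Python sketch is illustrative only: the dictionary `children`, the function `leaf_values`, and the variable names are chosen here and are not notation from the paper. It expands an assignment of weights on hierarchy nodes into the value seen at each leaf, which is the sum of the weights on the leaf and on all of its ancestors.

```python
# Hierarchy from the media-retailer example: node -> list of children.
# Nodes that do not appear as keys are leaves.
children = {
    "all": ["DVD's", "books", "CD's"],
    "DVD's": ["action-movie DVD's", "comedy DVD's"],
}

def leaf_values(children, root, weights):
    """Return {leaf: sum of the weights on the leaf and on all its ancestors}."""
    values = {}

    def walk(node, inherited):
        total = inherited + weights.get(node, 0)
        kids = children.get(node, [])
        if not kids:                 # a leaf: record the accumulated weight
            values[node] = total
        for kid in kids:
            walk(kid, total)

    walk(root, 0)
    return values

# The two-term explanation from the text.
explanation = {"all": 8000, "action-movie DVD's": -2000}
sales_change = leaf_values(children, "all", explanation)
print(sales_change)
```

Two nonzero weights reproduce all four leaf values, whereas listing the leaves directly would need four terms.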

Here is a definition of Tree x Tree. An instance consists of an $m \times n$ matrix $A = (a_{ij})$, along
with two rooted trees, a tree $T_1$ whose leaf set is the set of rows of the matrix, and a tree $T_2$ whose
leaf set is the set of columns. Let $L_i(v)$ be the leaf descendants of node $v$ in tree $T_i$, $i \in \{1,2\}$.
Now $\mathcal{R}$ is just the set $\{L_1(u) \times L_2(v) : u$ is a node in $T_1$ and $v$ is a node in $T_2\}$. The goal is to
find the smallest set of weighted rectangles which represents $A$. We prove this problem NP-hard
and give a randomized 2-approximation algorithm for it. APX-hardness is not known.

The second problem, AllRects, is motivated by the need to concisely describe or explain
linearly-ordered data. Imagine that one has two ordered parameters, such as horizontal and vertical
location, or age and salary. No trees are involved now. Instead we allow any interval of rows (i.e.,
$\{i : i_1 \le i \le i_2\}$ for any $1 \le i_1 \le i_2 \le m$) and any interval of columns (i.e.,
$\{j : j_1 \le j \le j_2\}$ for any $1 \le j_1 \le j_2 \le n$). For example, [800,1000] x [500,1500] could be used
to represent a geographical region extending eastward from 800 to 1000 miles and northward from 500
to 1500 miles, and [35.0, 45.0] x [80000, 95000] could be used to represent the subset of people 35-44
years old and earning a salary of $80000-$95000. Then we can use the former "rectangles" to summarize
the change (say, in population counts) with respect to location, or use the latter with respect to
demographic attributes age and salary.

Hence in AllRects the set $\mathcal{R}$ of allowed rectangles is the cross product between the set of row
intervals and the set of column intervals. As a linear combination of how few arbitrary rectangles
can we write the given matrix? We prove this problem NP-hard and give a 2.56-approximation
algorithm for it. Again, APX-hardness is unknown.



To our knowledge, while numerous papers have studied similar problems, none proposes any
algorithm for either of the two problems we study. One very relevant prior piece of work is a
polynomial-time exact algorithm solving the 1-dimensional version of Tree x Tree (more properly
called the "tree" case in 1-d, since only one tree is involved) [1]. Here, as in the media-retailer
example above, we have a sequence of integers and a tree whose leaves are the elements of the
sequence. Indeed, we use this algorithm heavily in constructing our randomized constant-factor
approximation algorithm for the Tree x Tree case.

Relevant to our work is [4] by Bansal, Coppersmith, and Schieber, which (in our language)
studies the 1-d (exact) problem in which all intervals are allowed and all must have nonnegative
weights, proves the problem NP-hard, and gives a constant-factor approximation algorithm.

Also very relevant is a paper by Natarajan [13], which studies an "inexact" version of the
problem: instead of finding weighted rectangles whose sum of weights is $a_{ij}$ exactly, for each
matrix cell $(i,j)$, these sums approximate the $a_{ij}$'s. (Natarajan's algorithm is more general and can
handle any arbitrary set $\mathcal{R}$ of allowed rectangles; however, the algorithm is very slow.) More
precisely, in the output set of rectangles, define $a'_{ij}$ to be the sum of the weights of the rectangles
containing cell $(i,j)$. Natarajan's algorithm ensures, given a tolerance $\Delta > 0$, that the $L_2$ error
$\sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} (a_{ij} - a'_{ij})^2}$ is at most $\Delta$. (Natarajan's algorithm cannot be used
for $\Delta = 0$.) The upper bound on the number of rectangles produced by Natarajan's algorithm is a
factor of approximately $18 \ln(\|A\|_2/\Delta)$ (where $\|A\|_2$ is the square root of the sum of squares of
the entries of $A$) larger than the optimal number used by an adversary who is allowed, instead, only
$L_2$-error $\Delta/2$. Furthermore, Natarajan's algorithm is much slower than our algorithms.

Frieze and Kannan [9] show how to inexactly represent a matrix as a sum of a small number
of rank-1 matrices, but their method is unsuitable for our problem: not only is there no way
to restrict the rank-1 matrices to be rectangles, but the error is of $L_1$ type rather than $L_\infty$.
In other words, the sum of the $mn$ errors is bounded by $\Delta mn$, rather than the individual errors
being bounded by $\Delta$.


Our problem may remind readers of compressed sensing, the decoding aspect of which requires 
one to seek a solution x with fewest nonzeroes to a linear system Hx = b. The key insight of 
compressed sensing is that when H satisfies the "restricted isometry property" [Ml El [8], as do 
almost all random matrices, the solution $x$ of minimum $L_1$ norm is also the sparsest solution. The
problem with applying compressed sensing to the problems mentioned herein, when the matrix $A$
is $m \times n$, is that the associated matrix $H$, which has $mn$ rows and a number of columns equal to
the number of allowed rectangles, is anything but random. On a small set of test instances, the
authors found the solutions of minimum $L_1$ norm (using linear programming) and discovered that
they were far from sparsest.

Other authors have studied other ways of representing matrices. Applegate et al. [2] studied the
problem of representing a binary matrix, starting from an all-zero matrix, by an ordered sequence
of rectangles, each of whose entries is all 0 or all 1, in which $a_{ij}$ should equal the entry of the last
rectangle which contains cell $(i,j)$. Anil Kumar and Ramesh [3] study the same model in which only
all-1 rectangles are allowed (in which case the order clearly doesn't matter). Two papers [14, 11]
study the Gale-Berlekamp switching game and can be thought of as a variant of our problem over
$\mathbb{Z}_2$.

3 A Few Words About Practicality 

Admittedly, for noisy data in the real world, probably more practical problems than our "exact"
problems are these two bounded-error (i.e., $L_\infty$) "inexact" problems: Given an input of either
Tree x Tree or AllRects and a number $\Delta > 0$, find a smallest subset of allowed rectangles,
and weights for each, such that for any cell $(i,j)$, $a_{ij}$ differs from the sum of the weights of the
rectangles containing $(i,j)$ by at most $\Delta$ in absolute value. We leave these problems for
future work. Nonetheless, we find the exact problems interesting and the solutions nontrivial, and
hope that studying them may yield insight for solving the $\Delta > 0$ case.

4 Formal Definitions and Examples 

Given an $m \times n$ matrix $A = (a_{ij})$ and $1 \le i_1 \le i_2 \le m$, $1 \le j_1 \le j_2 \le n$, recall that
$Rect(i_1,i_2,j_1,j_2) = \{(i,j) \mid i_1 \le i \le i_2,\ j_1 \le j \le j_2\}$. Define
$Rects = \{Rect(i_1,i_2,j_1,j_2) \mid 1 \le i_1 \le i_2 \le m,\ 1 \le j_1 \le j_2 \le n\}$. For each of the two
problems, we are given a subset $\mathcal{R} \subseteq Rects$; the only difference between the two problems we
discuss is the definition of $\mathcal{R}$. The goal is to find a smallest subset $OPT_2(A)$ of $\mathcal{R}$, and an
associated weight $w(R)$ (positive or negative) for each rectangle $R$, such that every cell $(i,j)$ is
covered by rectangles whose weights sum to $a_{ij}$, that is,

\[ a_{ij} = \sum_{R \,:\, R \in OPT_2(A) \text{ and } R \ni (i,j)} w(R), \qquad (1) \]

the "2" in "$OPT_2(A)$" referring to the fact that $A$ is 2-dimensional.

While the algorithm for the Tree x Tree case appears (in Section 5) before that for the
arbitrary-rectangles case (in Section 6), here we define AllRects, the latter, first, since it's easier
to define. As mentioned above, we call the case of $\mathcal{R} = Rects$ AllRects.
Example. Since the matrix $A$ can be written as a linear combination with
$w(\{1,2,3,4\} \times \{1,2,3,4\}) = 2$, $w(\{2,3,4\} \times \{1,2\}) = 3$, $w(\{3\} \times \{1,2,3,4\}) = 1$,
$w(\{2,3\} \times \{2,3\}) = -2$, and $w(\{2\} \times \{3\}) = 1$, we have $|OPT_2(A)| \le 5$.
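
As a sanity check on the notion of "representing" a matrix, the following Python sketch sums the five weighted rectangles listed in the example. The function name `sum_of_rectangles` and the tuple encoding of a rectangle are assumptions made here for illustration; the printed matrix is simply what those five weights imply.

```python
def sum_of_rectangles(m, n, weighted_rects):
    """Sum weighted rectangles into an m x n matrix.

    Each rectangle is ((i1, i2), (j1, j2), w) with 1-based inclusive bounds,
    i.e. Rect(i1, i2, j1, j2) carrying weight w.
    """
    A = [[0] * n for _ in range(m)]
    for (i1, i2), (j1, j2), w in weighted_rects:
        assert 1 <= i1 <= i2 <= m and 1 <= j1 <= j2 <= n, "not a member of Rects"
        for i in range(i1 - 1, i2):
            for j in range(j1 - 1, j2):
                A[i][j] += w
    return A

# The five weighted rectangles from the AllRects example.
rects = [
    ((1, 4), (1, 4), 2),
    ((2, 4), (1, 2), 3),
    ((3, 3), (1, 4), 1),
    ((2, 3), (2, 3), -2),
    ((2, 2), (3, 3), 1),
]
A = sum_of_rectangles(4, 4, rects)
for row in A:
    print(row)
```

Every row set and column set used is an interval, so all five rectangles are allowed in AllRects, and the count of five is an upper bound on the optimum for the resulting matrix.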
We need some notation in order to define Tree x Tree, in which we are also given trees $T_1$
and $T_2$. We use $R_i$ to denote the row vector in the $i$th row of the input matrix, $1 \le i \le m$. For
a node $u \in T_1$, let $S^R_u = \{R_l : l$ is a leaf descendant in $T_1$ of $u\}$. Similarly, we use $C_j$ to denote
the column vector in the $j$th column of the input matrix, $1 \le j \le n$. For a node $v \in T_2$, let
$S^C_v = \{C_l : l$ is a leaf descendant in $T_2$ of $v\}$. Note that, since $T_1$ and $T_2$ are trees,
$\{S^R_u \mid u \in T_1\}$ and $\{S^C_v \mid v \in T_2\}$ are laminar.

In this notation, in Tree x Tree, $\mathcal{R} = \{S^R_u \mid u \in T_1\} \times \{S^C_v \mid v \in T_2\}$.
Example. Using trees $T_1$, $T_2$ having a root with four children (and no other nodes) apiece, we
may use any single row or all rows, and any single column or all columns. For example, we can
write the matrix $A$ as a sum with $w(\{1,2,3,4\} \times \{1,2,3,4\}) = 3$, $w(\{1\} \times \{1,2,3,4\}) = 2$,
$w(\{3\} \times \{1,2,3,4\}) = -1$, $w(\{1,2,3,4\} \times \{3\}) = -1$, $w(\{1\} \times \{2\}) = -2$,
$w(\{2\} \times \{2\}) = -3$, $w(\{2\} \times \{4\}) = 1$, and $w(\{3\} \times \{4\}) = 1$. Since there are eight
rectangles, $|OPT_2(A)| \le 8$.
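
The following Python sketch checks that the eight rectangles of this example are allowed for the star-shaped trees, and computes the matrix they sum to. The helper `leaf_sets`, the dictionary encoding of a tree, and the node label "root" are assumptions introduced here; the resulting matrix is just what the eight listed weights imply.

```python
def leaf_sets(children, root):
    """Return the family of leaf-descendant sets, one per tree node."""
    sets = {}

    def walk(node):
        kids = children.get(node, [])
        if not kids:
            sets[node] = frozenset([node])           # a leaf corresponds to itself
        else:
            sets[node] = frozenset().union(*(walk(k) for k in kids))
        return sets[node]

    walk(root)
    return set(sets.values())

star = {"root": [1, 2, 3, 4]}        # a root with four leaf children and no other nodes
row_sets = leaf_sets(star, "root")   # T_1
col_sets = leaf_sets(star, "root")   # T_2

rects = [
    ({1, 2, 3, 4}, {1, 2, 3, 4}, 3), ({1}, {1, 2, 3, 4}, 2), ({3}, {1, 2, 3, 4}, -1),
    ({1, 2, 3, 4}, {3}, -1), ({1}, {2}, -2), ({2}, {2}, -3), ({2}, {4}, 1), ({3}, {4}, 1),
]

# A rectangle is allowed iff its row set and its column set both come from tree nodes.
allowed = all(frozenset(r) in row_sets and frozenset(c) in col_sets for r, c, _ in rects)

A = [[sum(w for r, c, w in rects if i in r and j in c) for j in range(1, 5)]
     for i in range(1, 5)]
print(allowed)
for row in A:
    print(row)
```

A set such as {1, 2} is an interval but is not the leaf set of any node of the star tree, so a rectangle using it would be allowed in AllRects and not in this Tree x Tree instance.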

Note that we use the same notation, $OPT_2(A)$, for the optimal solutions of both AllRects
and Tree x Tree.

5 Approximation Algorithm for Tree x Tree

We defer the proof of NP-Hardness of Tree x Tree to the appendix.

Our algorithm will rely upon the exact algorithm, due to Agarwal et al. [1], for the case in
which the matrix has just one column (that is, the 1-dimensional case).

Definition 1. Given a fixed rooted tree $T_1$ with $m$ leaves, and an $m$-vector $V = (v_i)$, let $OPT_1(V)$
denote a smallest set of intervals $I = \{i : i_1 \le i \le i_2\} \subseteq [1,m]$ and associated weights $w(I)$, each $I$
corresponding to a node of $T_1$, such that for all $i$, $v_i = \sum_{I \,:\, I \in OPT_1(V) \text{ and } I \ni i} w(I)$.

Clearly $|OPT_1(V)|$ equals $|OPT_2(V')|$, where $V'$ is the $m \times 1$ matrix containing $V$ as a column.
The difference is that $OPT_1(V)$ is a set of vectors while $OPT_2(V')$ is a set of rectangles. We
emphasize that $V$ is a vector and that the definition depends on $T_1$ and not $T_2$ by putting the "1"
in "$OPT_1(V)$". The key point is that [1] showed how to compute $OPT_1(V)$ exactly.

In order to charge the algorithm's cost against $OPT_2(A)$, we need to know some facts about
$OPT_2(A)$. Recall that $OPT_2(A)$ is a smallest subset of $\mathcal{R}$ such that there are weights $w(R)$ such
that equation (1) holds.

Definition 2.

1. For each rectangle $R$ and associated weight $w_R$, let $R^{w_R}$ denote the $m \times n$ matrix which is
0 for every cell $(i,j)$, except that $R^{w_R}_{ij} := w_R$ if $(i,j) \in R$.

2. Given a vertex $v$ of $T_2$, let $D_v$ be the set of all $R \in OPT_2(A)$ such that $R$ has column set
exactly equal to $S^C_v$.

3. Now let $K_v = \sum_{R \in D_v} R^{w_R}$. By definition of $D_v$, all columns $j$ of $K_v$ with $C_j \in S^C_v$ are the
same. Let $V_v$ be column $j$ of $K_v$ for any $j$ with $C_j \in S^C_v$.

Lemma 3. The column vectors $(V_v)$ satisfy the following:

1. For all leaves $l$ in $T_2$, the vector $C_l$ equals the sum of $V_v$ over all ancestors $v$ of $l$ in $T_2$.

2. For all leaves $l'$ and $l''$ in $T_2$ with a common ancestor $u$, the vector $C_{l'} - C_{l''}$ equals the sum
of $V_v$ over all vertices $v$ on the path from $u$ down to $l'$ (not including $v = u$) minus the sum
of $V_v$ over all vertices $v$ on the path from $u$ down to $l''$ (not including $v = u$).

3. The union, over all vertices $v \in T_2$, of $OPT_1(V_v) \times \{S^C_v\}$ (which obviously has size $|OPT_1(V_v)|$),
with the corresponding weights, is an optimal solution for Tree x Tree on $A$.

4. $|OPT_2(A)| = \sum_{v \in T_2} |OPT_1(V_v)|$.

Proof. The nodes $v$ which correspond to sets of columns containing column $C_l$ are exactly the
ancestors in $T_2$ of $l$. Hence, Part 1 follows.

Part 2 is an immediate corollary of Part 1.

Clearly, by Part 1, the union over all vertices $v \in T_2$ of $OPT_1(V_v) \times \{S^C_v\}$ is a feasible solution
for Tree x Tree on $A$. It is also optimal, and here is a proof. The size of the optimal solution
$OPT_2(A)$ equals the sum, over vertices $v \in T_2$, of the number of rectangles in $OPT_2(A)$ having
column set $S^C_v$. Fix a vertex $v \in T_2$. Since the weighted sum of the rectangles in $OPT_2(A)$ with
column set $S^C_v$ is $V_v$ in each of those columns, and each has a row set $S^R_u$ for some $u \in T_1$, the
number of such rectangles must be at least $|OPT_1(V_v)|$. If the number of rectangles with column set
$S^C_v$ strictly exceeded $|OPT_1(V_v)|$, we could replace all rectangles in $OPT_2(A)$ having column set
$S^C_v$ by a smaller set of weighted rectangles having column set $S^C_v$, each of whose columns is the
same, and summing to $V_v$ in each column; since the new set and the old set have the same weighted
sum, the new solution would still sum to $A$, and have better-than-optimal size, thereby contradicting
optimality of $OPT_2(A)$. Part 3 follows.

Part 4 follows from Part 3. □

Lemma 3 will be instrumental in analyzing the algorithm.

While the algorithm is very simple to state, it was nontrivial to develop and analyze. In the
algorithm, we use the algorithm by Agarwal et al. [1] to obtain $OPT_1(V)$ given a vector $V$.



Algorithm for Tree x Tree

1. For every internal node $u$ in the tree $T_2$, pick a random child $u^*$ of $u$ and let $c(u) = u^*$. Let
$path(u)$ be the random path going from $u$ to a leaf:

\[ u \mapsto c(u) \mapsto c(c(u)) \mapsto \cdots \mapsto l(u), \]

where we denote the last node on the path, the leaf, by $l(u)$.

2. Where $root$ denotes the root of $T_2$, for every node $u$ in $T_2$, in increasing order by depth, do:

• If $u$ is the root of $T_2$, then
Output $OPT_1(C_{l(root)}) \times \{S^C_{root}\}$ with the corresponding weights (those of the optimal
solution for $C_{l(root)}$).

• Else
Let $p(u)$ be the parent of $u$.
Output $OPT_1(C_{l(u)} - C_{l(p(u))}) \times \{S^C_u\}$ with the corresponding weights.
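
The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `tree_x_tree`, `leaf_only_opt1`, and `total` are invented here, the tree $T_2$ is encoded as a dictionary from internal nodes to child lists with columns 0, ..., n-1 as leaves, and `leaf_only_opt1` is a deliberately naive stand-in for the exact 1-dimensional algorithm of [1], which is not reproduced. With the stand-in, the output is a valid representation of $A$ but carries no approximation guarantee; plugging in a true $OPT_1$ solver for $T_1$ would be needed for that.

```python
import random

def leaf_only_opt1(vec):
    """Stand-in for OPT_1 (assumption): one single-row interval per nonzero entry."""
    return [((i, i), v) for i, v in enumerate(vec) if v != 0]

def tree_x_tree(A, children, root, opt1=leaf_only_opt1, rng=random):
    """Return weighted rectangles (row_interval, column_set, weight) summing to A."""
    m = len(A)
    # Step 1: every internal node u of T2 picks a random child c(u).
    c = {u: rng.choice(kids) for u, kids in children.items()}

    def l(u):                        # the leaf l(u) at the end of path(u)
        while u in c:
            u = c[u]
        return u

    def leaves(u):                   # columns below u, i.e. the column set of u
        if u not in children:
            return [u]
        return [x for kid in children[u] for x in leaves(kid)]

    def column(j):
        return [A[i][j] for i in range(m)]

    # Step 2: visit nodes in increasing depth; the root uses C_{l(root)},
    # every other node u uses C_{l(u)} - C_{l(p(u))}.
    out, queue = [], [(root, None)]
    while queue:
        u, parent = queue.pop(0)
        vec = column(l(u))
        if parent is not None:
            vec = [x - y for x, y in zip(vec, column(l(parent)))]
        out += [(rows, frozenset(leaves(u)), w) for rows, w in opt1(vec)]
        queue += [(kid, u) for kid in children.get(u, [])]
    return out

def total(rects, m, n):
    """Sum the weighted rectangles back into an m x n matrix."""
    M = [[0] * n for _ in range(m)]
    for (i1, i2), cols, w in rects:
        for i in range(i1, i2 + 1):
            for j in cols:
                M[i][j] += w
    return M

T2 = {"root": ["left", "right"], "left": [0, 1], "right": [2, 3]}
A = [[5, 5, 1, 2], [0, 0, 1, 2], [7, 3, 3, 3]]
rects = tree_x_tree(A, T2, "root", rng=random.Random(0))
print(total(rects, 3, 4) == A)
```

Whatever random children are chosen, the vectors attached to the ancestors of a leaf telescope to that leaf's column, which is why the reconstruction check succeeds for every seed.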



Theorem 4. The expected cost of the algorithm is at most $2|OPT_2(A)|$.

In the main part of the paper we prove a weaker guarantee for exposition: the expected cost of
the algorithm is at most $4|OPT_2(A)|$. We defer the improvement to the appendix.
The algorithm can be easily derandomized using dynamic programming.

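Proof. Every column $C_u$, where $u$ is a leaf of $T_2$, is covered by rectangles with sum

\[ (C_u - C_{l(p(u))}) + (C_{l(p(u))} - C_{l(p(p(u)))}) + \cdots + C_{l(root)} = C_u. \]

Thus the algorithm produces a valid solution. We now must estimate the expected cost of the
solution. The total cost incurred by the algorithm is

\[ |OPT_1(C_{l(root)})| + \sum_{u \ne root} |OPT_1(C_{l(u)} - C_{l(p(u))})|. \]

Assume, without loss of generality, that all nodes in the tree either have two or more children or
are leaves. Denote the number of children of a node $v$, the degree of $v$, by $d(v)$. Denote by $\mathbf{1}$ the
indicator function. Observe that for the root node we have, by Lemma 3 (1),

\[ |OPT_1(C_{l(root)})| \le \sum_{v \in path(root)} |OPT_1(V_v)|; \]

for a nonroot vertex $u$, we have by Lemma 3 (2), keeping in mind that $l(\cdot)$, $c(\cdot)$, and $path(\cdot)$ are
random,

\[ |OPT_1(C_{l(u)} - C_{l(p(u))})| \le \Big( \sum_{v \in path(u)} |OPT_1(V_v)| + \sum_{v \in path(c(p(u)))} |OPT_1(V_v)| \Big) \cdot \mathbf{1}(u \ne c(p(u))). \]

Here we used the triangle inequality for the function $|OPT_1(\cdot)|$.

Consider the second sum in the right-hand side. For every child $u'$ of $p(u)$, the random node
$c(p(u))$ takes value $u'$ with probability $1/d(p(u))$. Thus

\[ E\Big[ \mathbf{1}(u \ne c(p(u))) \sum_{v \in path(c(p(u)))} |OPT_1(V_v)| \Big] = \frac{1}{d(p(u))} \sum_{u' \ne u \,:\, p(u') = p(u)} E\Big[ \sum_{v \in path(u')} |OPT_1(V_v)| \Big]. \]

For the first sum, the event $u \ne c(p(u))$ is independent of $path(u)$, and
$\Pr(u \ne c(p(u)))$ equals $(d(p(u)) - 1)/d(p(u))$. Denote this expression by $\alpha_u$. The total expected
size of the solution returned by the algorithm is bounded by

\[ E\Big[ \sum_{v \in path(root)} |OPT_1(V_v)| \Big] + \sum_{u \ne root} \alpha_u E\Big[ \sum_{v \in path(u)} |OPT_1(V_v)| \Big] + \sum_{u \ne root} \frac{1}{d(p(u))} \sum_{u' \ne u \,:\, p(u') = p(u)} E\Big[ \sum_{v \in path(u')} |OPT_1(V_v)| \Big]. \]

Notice that, for a fixed $u' \ne root$, the term $E[\sum_{v \in path(u')} |OPT_1(V_v)|]$ appears in the last
double sum once for each of the $d(p(u')) - 1$ siblings $u$ of $u'$, so its total coefficient there is
$(d(p(u')) - 1)/d(p(u')) = \alpha_{u'}$. Hence, since $\alpha_u \le 1$, the total cost of the solution is bounded by

\[ E\Big[ \sum_{v \in path(root)} |OPT_1(V_v)| \Big] + \sum_{u \ne root} \alpha_u E\Big[ \sum_{v \in path(u)} |OPT_1(V_v)| \Big] + \sum_{u' \ne root} \alpha_{u'} E\Big[ \sum_{v \in path(u')} |OPT_1(V_v)| \Big] \le 2 \sum_{u} E\Big[ \sum_{v \in path(u)} |OPT_1(V_v)| \Big]. \]

Finally, observe that node $v$ belongs to $path(v)$ with probability 1; it belongs to $path(p(v))$
with probability at most 1/2; it belongs to $path(p(p(v)))$ with probability at most 1/4,
etc. It belongs to $path(u)$ with probability 0 if $u$ is not an ancestor of $v$. Thus

\[ 2 \sum_{u} E\Big[ \sum_{v \in path(u)} |OPT_1(V_v)| \Big] = 2 \sum_{v} |OPT_1(V_v)| \cdot \Big( \sum_{u} \Pr(v \in path(u)) \Big) \]
\[ \le 2 \sum_{v} |OPT_1(V_v)| \cdot (1 + 1/2 + 1/4 + \cdots) \]
\[ \le 4 \sum_{v} |OPT_1(V_v)| \le 4|OPT_2(A)|. \]

We have proven that the algorithm finds a 4-approximation. A slightly more careful analysis, in
the appendix, shows that the approximation ratio of the algorithm is at most 2. □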






What is the running time of the 2-approximation algorithm? The time needed to run the
1-dimensional algorithm of [1] is $O(dn)$ where there are $n$ leaves in each tree and the smaller of the
two depths is $d$. One can verify that the running time of our 2-approximation algorithm is a factor
$O(n)$ larger, or $O(dn^2)$. In most applications at least one of the trees would have depth $O(\log n)$,
giving $O(n^2 \log n)$ in total.

6 Approximation Algorithm For AllRects 
6.1 The 1-Dimensional Problem 

First we consider the one-dimensional case, for which we will give a $(23/18 + \varepsilon)$-approximation
algorithm; $23/18 < 1.278$. We are given a sequence $a_1, a_2, \ldots, a_n$ of numbers and we need to find
a collection of closed intervals $[i,j]$ with arbitrary real weights $w_{ij}$ so that every integral point
$k \in \{1, \ldots, n\}$ is covered by a set of intervals with total weight $a_k$. That is, for all $k$,

\[ \sum_{i,j \,:\, k \in [i,j]} w_{ij} = a_k. \qquad (5) \]

Our goal is to find the smallest possible collection. We shall use the approach of Bansal,
Coppersmith, and Schieber [4] (in their problem all $a_i \ge 0$ and all $w_{ij} > 0$). Set $a_0 = 0$ and $a_{n+1} = 0$.
Observe that if $a_k = a_{k+1}$, then in the optimal solution every interval covering $k$ also covers $k+1$.
On the other hand, since every interval covering both $k$ and $k+1$ contributes the same weight to
$a_k$ and $a_{k+1}$, if $a_k \ne a_{k+1}$, then there should be at least one interval that either covers $k$ but not
$k+1$, or covers $k+1$ but not $k$. By the same reason, the difference $a_{k+1} - a_k$, which we denote
by $\Delta_k = a_{k+1} - a_k$, equals the difference between the weight of intervals with the left endpoint at
$k+1$ and the weight of intervals with the right endpoint at $k$:

\[ \Delta_k = \sum_{j \,:\, k+1 \le j} w_{k+1,j} - \sum_{i \,:\, i \le k} w_{ik}. \qquad (6) \]

Note that if we find a collection of intervals with weights satisfying (6), then this collection of
intervals is a valid solution to our problem, i.e., then equality (5) holds. Define a directed graph
on vertices $\{0, \ldots, n\}$. For every interval $[i,j]$, we add an arc going from $i-1$ to $j$. Then the
condition (6) can be restated as follows: The sum of weights of arcs outgoing from $k$ minus the sum
of weights of arcs entering $k$ equals $\Delta_k$. Our goal is to find the smallest set of arcs with non-zero
weights satisfying this property. Consider an arbitrary solution and one of the weakly connected
components $S$. The sum $\sum_{k \in S} \Delta_k = 0$, since every arc is counted twice in the sum, once with
the plus sign and once with the minus sign. Since $S$ is a connected component, the number of arcs
connecting nodes in $S$ is at least $|S| - 1$. Thus a lower bound on the number of arcs or intervals in
the optimal solution is the minimum of

\[ \sum_{t=1}^{M} (|S_t| - 1) = n + 1 - M \]

among all partitions of the set of items $\{0, \ldots, n\}$ into $M$ disjoint sets $S_1, \ldots, S_M$ such that
$\sum_{k \in S_t} \Delta_k = 0$ for all $t$. On the other hand, given such a partition $(S_1, \ldots, S_M)$, we can easily
construct a set of intervals. Let $k_t$ be the minimal element in $S_t$. For every element $k$ in $S_t \setminus \{k_t\}$,
we add an interval $[k_t + 1, k]$ with weight $-\Delta_k$. We now verify that these intervals satisfy (6). If
$k$ belongs to $S_t$ and $k \ne k_t$, then there is only one interval in the solution with right endpoint at
$k$. This interval is $[k_t + 1, k]$ and its weight is $-\Delta_k$. The solution does not contain intervals with
left endpoint at $k + 1$ (since $k \ne k_t$). Thus (6) holds. If $k$ belongs to $S_t$ and $k = k_t$,
the solution does not contain intervals with the right endpoint at $k$, but for all $k' \in S_t \setminus \{k\}$ there
is an interval $[k + 1, k']$ with weight $-\Delta_{k'}$. The total weight of these intervals equals

\[ \sum_{k' \in S_t \setminus \{k\}} -\Delta_{k'} = -\sum_{k' \in S_t} \Delta_{k'} + \Delta_k = \Delta_k. \]

Condition (6) again holds.

Thus the problem is equivalent to the problem of partitioning the set of items $\{0, \ldots, n\}$ into
a family of $M$ sets $\{S_1, \ldots, S_M\}$ satisfying the condition $\sum_{k \in S_t} \Delta_k = 0$ for all $t$, so as to minimize
$\sum_t (|S_t| - 1) = (n + 1) - M$. Notice that the sum of all $\Delta_k$ equals 0. Moreover, every set with the
sum of $\Delta_k$ equal to 0 corresponds to an instance of the 1-dimensional rectangle covering problem.
We shall refer to the problem as Zero-Weight Partition.
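
To make the equivalence concrete on tiny inputs, here is a brute-force Python sketch that computes the largest possible number $M$ of zero-sum parts and hence the optimal number of intervals $n + 1 - M$. It is not the approximation algorithm of this section: it is an exponential-time dynamic program over subsets, the function names `max_zero_sum_parts` and `min_intervals` are invented here, and it assumes exact (for example integer) arithmetic so that testing a sum against zero is meaningful.

```python
def max_zero_sum_parts(deltas):
    """Largest M such that deltas splits into M nonempty zero-sum groups.

    best[mask] is the maximum number of zero-sum prefixes over all orderings of
    the items in mask; for the full set this equals the best partition size.
    """
    n = len(deltas)
    assert sum(deltas) == 0, "the Delta_k always sum to zero"
    subset_sum = [0] * (1 << n)
    best = [0] * (1 << n)
    for mask in range(1, 1 << n):
        low = (mask & -mask).bit_length() - 1        # index of the lowest set bit
        subset_sum[mask] = subset_sum[mask & (mask - 1)] + deltas[low]
        # Peel off one last item; a zero-sum mask closes one more group.
        best[mask] = max(best[mask ^ (1 << i)] for i in range(n) if mask >> i & 1)
        best[mask] += subset_sum[mask] == 0
    return best[-1]

def min_intervals(a):
    """Optimal number of weighted intervals representing a, computed as n + 1 - M."""
    padded = [0] + list(a) + [0]                     # a_0 = a_{n+1} = 0
    deltas = [padded[k + 1] - padded[k] for k in range(len(padded) - 1)]
    return len(deltas) - max_zero_sum_parts(deltas)

print(min_intervals([5, 5, 5]))   # a constant sequence needs a single interval
print(min_intervals([1, 2, 1]))
```

The table has $2^{n+1}$ entries, so this is only usable for checking small cases by hand.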

We now describe the approximation algorithm for Zero-Weight Partition, which is a
modification of the algorithm of Bansal, Coppersmith, and Schieber [4] designed for a slightly different
problem (that of minimizing setup times in radiation therapy).

Remark 5. For Zero-Weight Partition, our algorithm gives a slightly better approximation
guarantee than that of [4]: $23/18 \approx 1.278$ vs $9/7 \approx 1.286$. The difference between algorithms is
that the algorithm of Bansal, Coppersmith, and Schieber [4] performs either the first and third
steps (in terms of our algorithm; see below), or the second and third steps; while our algorithm
always performs all three steps.

In the first step the algorithm picks all singleton sets $\{k\}$ with $\Delta_k = 0$ and pairs $\{i,j\}$ with
$\Delta_i = -\Delta_j$. It removes the items covered by any of the chosen sets. At the second step, with
probability 2/3 the algorithm enumerates all triples $\{i,j,k\}$ with $\Delta_i + \Delta_j + \Delta_k = 0$ and finds the
largest 3-set packing among them using the $(3/2 + \varepsilon)$-approximation algorithm due to Hurkens and
Schrijver [10], i.e., it finds the largest (up to a factor of $3/2 + \varepsilon$) disjoint family of triples $\{i,j,k\}$
with $\Delta_i + \Delta_j + \Delta_k = 0$. Otherwise (with probability 1/3), the algorithm enumerates all quadruples
$\{i,j,k,l\}$ having $\Delta_i + \Delta_j + \Delta_k + \Delta_l = 0$ and finds the largest 4-set packing among them using the
$(2 + \varepsilon)$-approximation algorithm due to Hurkens and Schrijver [10]. At the third, final, step the
algorithm covers all remaining items, whose sum of $\Delta_k$'s is zero, with one set.
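
The first step is simple enough to sketch directly. The Python below is an illustration under assumptions made here: the function name `first_step` and its return format are invented, pairs are formed greedily by scanning left to right (one of several valid ways to pick disjoint opposite pairs), and the set-packing second step of Hurkens and Schrijver is not implemented.

```python
def first_step(deltas):
    """Pick singletons {k} with Delta_k = 0 and disjoint pairs {i, j} with
    Delta_i = -Delta_j. Returns (chosen_sets, remaining_indices)."""
    chosen, used = [], set()
    waiting = {}                       # value -> indices still waiting for a partner
    for k, d in enumerate(deltas):
        if d == 0:
            chosen.append({k})
            used.add(k)
        elif waiting.get(-d):          # an unmatched item of opposite value exists
            j = waiting[-d].pop()
            chosen.append({j, k})
            used.update((j, k))
        else:
            waiting.setdefault(d, []).append(k)
    remaining = [k for k in range(len(deltas)) if k not in used]
    return chosen, remaining

deltas = [15, -7, 2, 7, 1, -3, -15]
chosen, remaining = first_step(deltas)
print(chosen, remaining)
```

Because every chosen set has zero weight and all the $\Delta_k$ sum to zero, the remaining items also sum to zero, which is what lets the third step cover them with a single set.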

Before we start analyzing the algorithm, let us consider a simple example. Suppose that

\[ (a_1, a_2, a_3, a_4, a_5, a_6) = (15, 8, 10, 17, 18, 15). \]

First we surround the vector with two 0's:

\[ (a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7) = (0, 15, 8, 10, 17, 18, 15, 0). \]

Then compute the vector of $\Delta_k$'s:

\[ (\Delta_0, \Delta_1, \Delta_2, \Delta_3, \Delta_4, \Delta_5, \Delta_6) = (15 - 0,\ 8 - 15,\ 10 - 8,\ 17 - 10,\ 18 - 17,\ 15 - 18,\ 0 - 15) \]
\[ = (15, -7, 2, 7, 1, -3, -15). \]

Notice that $15 + (-7) + 2 + 7 + 1 + (-3) + (-15) = 0$. We partition the set into sets of weight 0:

\[ \{\Delta_0, \Delta_6\},\ \{\Delta_1, \Delta_3\},\ \{\Delta_2, \Delta_4, \Delta_5\}. \]

This partition corresponds to the following solution of the 1-dimensional problem: interval [1,6]
with weight 15, interval [2,3] with weight -7, interval [3,4] with weight -1, interval [3,5] with
weight 3.
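
The construction of intervals from a zero-weight partition can be replayed on this example. In the Python sketch below, the names `intervals_from_partition` and `represented` are invented for illustration; the rule itself (take the minimal element $k_t$ of each part and add $[k_t + 1, k]$ with weight $-\Delta_k$ for every other $k$ in the part) is the one described in the text.

```python
def intervals_from_partition(a, parts):
    """Turn a zero-sum partition of {0, ..., n} into ((left, right), weight) intervals."""
    padded = [0] + list(a) + [0]                          # a_0 = a_{n+1} = 0
    delta = [padded[k + 1] - padded[k] for k in range(len(a) + 1)]
    intervals = []
    for part in parts:
        assert sum(delta[k] for k in part) == 0, "every part must have zero weight"
        k_t = min(part)
        for k in sorted(part - {k_t}):
            intervals.append(((k_t + 1, k), -delta[k]))   # interval [k_t + 1, k]
    return intervals

def represented(intervals, n):
    """Value at each point 1..n: the total weight of the intervals covering it."""
    return [sum(w for (lo, hi), w in intervals if lo <= p <= hi)
            for p in range(1, n + 1)]

a = [15, 8, 10, 17, 18, 15]
parts = [{0, 6}, {1, 3}, {2, 4, 5}]
intervals = intervals_from_partition(a, parts)
print(intervals)
print(represented(intervals, len(a)) == a)
```

The partition has $M = 3$ parts over $n + 1 = 7$ items, and the construction indeed uses $7 - 3 = 4$ intervals.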

Lemma 6. For every $\varepsilon > 0$, the approximation ratio of the algorithm when using $\varepsilon$ is at
most $23/18 + O(\varepsilon)$, with $23/18 < 1.278$.

Proof. First, observe that the partitioning returned by the algorithm is a valid partitioning, i.e.,
every item belongs to exactly one set and the sum of $\Delta_k$'s in every set equals 0. We show that
the first step of the algorithm is optimal. That is, there exists an optimal solution that contains
exactly the same set of singletons and pairs as in the partition returned by the algorithm. Suppose
that the optimal solution breaks one pair $\{i,j\}$ ($\Delta_i = -\Delta_j$) and puts $i$ in $S$ and $j$ in $T$. Then we
can replace sets $S$ and $T$ with two new sets $\{i,j\}$ and $S \cup T \setminus \{i,j\}$. The new solution has the same
cost as before; the sum of $\Delta_k$'s in every set is 0, but the pair $\{i,j\}$ belongs to the partitioning.
Repeating this procedure several times, we can transform an arbitrary optimal solution into an
optimal solution that contains the same set of singletons and pairs as the solution obtained by the
approximation algorithm.

For the sake of the presentation let us assume that $\varepsilon = 0$ (that is, we assume that the
approximation algorithms due to Hurkens and Schrijver [10], which we use in our algorithm, have
approximation guarantees at most 3/2 and 2). Let $p_k$ be the number of sets of size $k$ in the optimal
solution. The cost of the optimal solution is $p_2 + 2p_3 + 3p_4 + 4p_5 + \cdots$, because the objective
function charges $|S| - 1$ to a set of size $|S|$. Our approximation algorithm also finds $p_1$ singleton
sets and $p_2$ pairs. Then with probability 2/3, it finds $s_3 \ge (2/3)p_3$ triples and covers the remaining
$3(p_3 - s_3) + 4p_4 + 5p_5 + \cdots$ vertices with one set; and with probability 1/3, it finds $s_4 \ge p_4/2$
quadruples and covers the remaining $3p_3 + 4(p_4 - s_4) + 5p_5 + \cdots$ vertices with one set. Thus the
expected cost of the solution returned by the algorithm is at most

\[ p_2 + \tfrac{2}{3}\big(3p_3 - s_3 + 4p_4 + 5p_5 + \cdots\big) + \tfrac{1}{3}\big(3p_3 + 4p_4 - s_4 + 5p_5 + \cdots\big) \le p_2 + \tfrac{23}{9} p_3 + \tfrac{23}{6} p_4 + 5p_5 + 6p_6 + \cdots. \]

Therefore, the approximation ratio of the algorithm, assuming that $\varepsilon = 0$, is at most

\[ \max\Big\{ \frac{1}{1},\ \frac{23/9}{2},\ \frac{23/6}{3},\ \frac{5}{4},\ \frac{6}{5},\ \ldots \Big\} = \frac{23}{18}. \]

It is easy to verify that if $\varepsilon > 0$, the approximation ratio of the algorithm is at most $23/18 + O(\varepsilon)$. □

We now prove that finding the exact solution of the problem is NP-hard.

Lemma 7. The Zero-Weight Partition problem is NP-hard.

Proof. We construct a reduction from the classical NP-complete 3-Partition problem to the
Zero-Weight Partition problem. Recall that in 3-Partition we are given $3m$ numbers $b_1, \ldots, b_{3m}$
strictly between $B/4$ and $B/2$ and we need to check if the set can be partitioned into $m$ sets such
that the sum of all elements in each set equals $B$ (and hence each set must have size 3). Such a
partition is a "3 partition." Given an instance of 3-Partition, we create $3m$ vertices each having
weight $\Delta_k = b_k$. Then we create $m$ vertices each with weight $\Delta_k = -B$. It is easy to see that every
set of weight zero must have at least four elements; moreover if the set contains exactly four elements
then one of the elements equals $-B$ and the other three sum up to $B$. Thus a 3 partition exists
in the original problem if and only if the vertices in the new problem can be partitioned into $m$
zero-weight sets, i.e., the value of the new problem is $4m - m = 3m$. □

Corollary 8. One-dimensional AllRects is NP-hard.
6.2 The 2-Dimensional Case 

We now consider the 2-dimensional case. We are given an $m \times n$ matrix $A = (a_{ij})$ ($1 \le i \le m$,
$1 \le j \le n$) and we need to cover it with the minimum number of weighted rectangles $Rect(i_1,i_2,j_1,j_2)$
(for arbitrary $i_1, i_2, j_1, j_2$); we use $w(i_1,i_2,j_1,j_2)$ for the weight of $Rect(i_1,i_2,j_1,j_2)$. We assume
that $a_{ij} = 0$ for $(i,j)$ outside the rectangle $\{1, \ldots, m\} \times \{1, \ldots, n\}$.

By analogy to the 1-dimensional case, define $\Delta_{ij} = a_{ij} - a_{i,j+1} + a_{i+1,j+1} - a_{i+1,j}$. Call a pair
$(i,j)$ with $0 \le i \le m$, $0 \le j \le n$, with $\Delta_{ij} \ne 0$ an array corner. Imagine that the matrix is written
in an $m \times n$ table, and $\Delta_{ij}$'s are written at the grid nodes. The key point is that every rectangle
covers exactly zero, one, two, or four of the cells $(i+1, j+1)$, $(i,j)$, $(i, j+1)$, $(i+1, j)$ bordering a grid
point, and that those covering two or four of those cells cannot affect $\Delta_{ij}$. This means that only
rectangles having a corner at the intersection of the $i$th and $j$th grid line contribute to $\Delta_{ij}$. (This
is why the definition of $\Delta_{ij}$ was "by analogy" to the 1-d case.) In other words,

\[ \Delta_{ij} = \sum_{i_2 \ge i+1,\ j_2 \ge j+1} w(i+1, i_2, j+1, j_2) + \sum_{i_1 \le i,\ j_1 \le j} w(i_1, i, j_1, j) \qquad (8) \]
\[ \qquad - \sum_{i_2 \ge i+1,\ j_1 \le j} w(i+1, i_2, j_1, j) - \sum_{i_1 \le i,\ j_2 \ge j+1} w(i_1, i, j+1, j_2). \]

This means that the number of rectangles in the optimal solution must be at least one quarter of 
the number of array corners, the "one-quarter" arising from the fact that each rectangle has exactly 
four corners and can hence be responsible for at most four of the array corners. 
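The corner count and the resulting lower bound are easy to compute. The sketch below is illustrative only: the helper names `corner_deltas` and `corner_lower_bound` are hypothetical, the matrix is a list of rows, and the zero padding implements the convention that $a_{ij} = 0$ outside the table.

```python
def corner_deltas(a):
    """Return the (m+1) x (n+1) table of Delta_ij for grid nodes 0 <= i <= m, 0 <= j <= n."""
    m, n = len(a), len(a[0])
    # p is a with a border of zeros, so p[i][j] is a_ij in the paper's 1-based indexing.
    p = [[0] * (n + 2) for _ in range(m + 2)]
    for i in range(m):
        for j in range(n):
            p[i + 1][j + 1] = a[i][j]
    return [[p[i][j] - p[i][j + 1] + p[i + 1][j + 1] - p[i + 1][j]
             for j in range(n + 1)] for i in range(m + 1)]

def corner_lower_bound(a):
    """Number of array corners and the lower bound ceil(corners / 4) on the optimum."""
    corners = sum(1 for row in corner_deltas(a) for d in row if d != 0)
    return corners, -(-corners // 4)

# A single weighted cell in the middle of a 3 x 3 table has exactly four array corners.
print(corner_lower_bound([[0, 0, 0], [0, 7, 0], [0, 0, 0]]))
```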

It is easy now to give a 4-approximation algorithm, which we sketch without proof, based on
this observation. Build a matrix $M$, initially all zero, which will eventually equal the input matrix
$A$. Until no more array corners exist in $A - M$, find an array corner $(i, j)$ with $i < m$ and $j < n$.
(As long as array corners exist, there must be one with $i < m$ and $j < n$.) Let $\Delta \ne 0$ be $\Delta_{ij}$. Add
to $M$ a rectangle of weight $\Delta$ with upper left corner at $(i, j)$ and extending as far as possible to
the right and downward, eliminating the array corner at $(i, j)$ in $A - M$.

It is easy to see that (1) when the algorithm terminates, $M = A$, and that (2) the number of
rectangles used is at most the number of array corners in $A$, and hence at most $4|OPT_2(A)|$.
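A minimal sketch of this 4-approximation follows; it is an illustration under stated assumptions rather than the authors' code. It relies on one observation not spelled out in the text: a rectangle that extends to the bottom and right borders changes $\Delta$ only at grid nodes on the last grid row or column, so the corners with $i < m$ and $j < n$ can be read off from $A$ in a single pass. The names `greedy_cover` and `rebuild` are hypothetical, and rows and columns are 0-indexed.

```python
def corner_deltas(a):
    """Delta_ij at grid nodes 0 <= i <= m, 0 <= j <= n, with zeros outside the table."""
    m, n = len(a), len(a[0])
    p = [[0] * (n + 2) for _ in range(m + 2)]
    for i in range(m):
        for j in range(n):
            p[i + 1][j + 1] = a[i][j]
    return [[p[i][j] - p[i][j + 1] + p[i + 1][j + 1] - p[i + 1][j]
             for j in range(n + 1)] for i in range(m + 1)]

def greedy_cover(a):
    """One rectangle per array corner (i, j) with i < m and j < n.
    A rectangle (top, left, w) covers rows top..m-1 and columns left..n-1."""
    m, n = len(a), len(a[0])
    d = corner_deltas(a)
    return [(i, j, d[i][j]) for i in range(m) for j in range(n) if d[i][j] != 0]

def rebuild(rects, m, n):
    """Sum the weighted rectangles back into an m x n matrix."""
    out = [[0] * n for _ in range(m)]
    for top, left, w in rects:
        for i in range(top, m):
            for j in range(left, n):
                out[i][j] += w
    return out

a = [[1, 1, 0], [1, 2, 1]]
rects = greedy_cover(a)
print(rects, rebuild(rects, 2, 3) == a)
```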

Now we give, instead, a more sophisticated $(23/9 + \epsilon)$-approximation algorithm for the
2D problem (with $23/9 + \epsilon < 2.56$ for small $\epsilon$). The idea is to make more efficient use of the rectangles. Instead of using only one
corner of each (in contrast to the adversary, who might use all four), now we will use two. In
fact, we will deal separately with different horizontal (between-consecutive-row) grid lines, using a
good 1-dimensional approximation algorithm to decide how to eliminate the array corners on that
grid line. Every time the 1-d algorithm tells us to use an interval $[j_1, j_2]$, we will instead inject a
rectangle which starts in column $j_1$ and ends in column $j_2$, and extends all the way to the bottom.
Because we use 2 of each rectangle's 4 corners, we pay a price of a factor of 4/2 over the 1-d
approximation ratio of $23/18 + O(\epsilon)$. Hence we will get $23/9 + O(\epsilon)$.

Here are the details. Fix $i$ and consider the restriction of the zero-weight partition problem to
the $i$th horizontal grid line, i.e., the 1-dimensional zero-weight partition problem with $\Delta_j = \Delta_{ij}$.
Denote by $OPT_i$ the cost of the optimal solution. The number of rectangles touching the $i$th
horizontal grid line from above or below is at least $OPT_i$, since only these rectangles contribute
to the $\Delta_{ij}$'s. Every rectangle touches only two horizontal grid lines, thus the total number of rectangles
is at least $\sum_{i=1}^{m} OPT_i / 2$.

All rectangles generated by our algorithm will touch the bottom line of the table; that is why we
lose a factor of 2. Note that if we could solve the 1-dimensional problem exactly we would be able
to find a covering with $\sum_{i=1}^{m} OPT_i$ rectangles and thus get a 2-approximation. For each horizontal
grid line $i$, the algorithm solves the 1-dimensional problem (with $\Delta_j = \Delta_{ij}$) and finds a set of
intervals $[j_1, j_2]$ with weights $w_{j_1 j_2}$. These intervals are the top sides of the rectangles generated by
the algorithm. All bottom sides of the rectangles lie on the bottom grid line of the table. That is,
for every interval $[j_1, j_2]$ the algorithm adds the rectangle $Rect(i, m, j_1, j_2)$ to the solution and sets
its weight $w(i, m, j_1, j_2)$ to be $w_{j_1 j_2}$.
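The line-by-line construction can be sketched as follows. This is an illustration only: `solve_line` is a deliberately naive stand-in for the 1-dimensional approximation algorithm (it simply chains consecutive nonzero $\Delta$'s and makes no attempt at the $23/18$ guarantee), and the names `cover_by_lines` and `rebuild` as well as the 0-indexed conventions are assumptions of the sketch.

```python
def corner_deltas(a):
    """Delta_ij at grid nodes 0 <= i <= m, 0 <= j <= n, with zeros outside the table."""
    m, n = len(a), len(a[0])
    p = [[0] * (n + 2) for _ in range(m + 2)]
    for i in range(m):
        for j in range(n):
            p[i + 1][j + 1] = a[i][j]
    return [[p[i][j] - p[i][j + 1] + p[i + 1][j + 1] - p[i + 1][j]
             for j in range(n + 1)] for i in range(m + 1)]

def solve_line(d):
    """Placeholder 1-d solver for a line of Deltas summing to zero.
    Returns intervals (j1, j2, w) that add +w at grid column j1 and -w at grid column j2."""
    intervals, running, prev = [], 0, None
    for j, v in enumerate(d):
        if v == 0:
            continue
        if prev is not None and running != 0:
            intervals.append((prev, j, running))
        running += v
        prev = j
    return intervals

def cover_by_lines(a):
    """Rectangles (top, bottom, left, right, w), inclusive and 0-indexed; all reach the last row."""
    m = len(a)
    d = corner_deltas(a)
    rects = []
    for i in range(m):                      # grid lines 0..m-1; the bottom line is skipped
        for j1, j2, w in solve_line(d[i]):
            rects.append((i, m - 1, j1, j2 - 1, w))
    return rects

def rebuild(rects, m, n):
    """Sum the weighted rectangles back into an m x n matrix."""
    out = [[0] * n for _ in range(m)]
    for top, bottom, left, right, w in rects:
        for i in range(top, bottom + 1):
            for j in range(left, right + 1):
                out[i][j] += w
    return out

a = [[1, 1, 0], [1, 2, 1]]
rects = cover_by_lines(a)
print(rects, rebuild(rects, 2, 3) == a)
```

Swapping a better 1-d solver into `solve_line` changes only the number of rectangles per grid line; the reconstruction argument is unaffected.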

The total number of rectangles in the solution output by the algorithm is $\sum_{i=1}^{m} ALG_i$, where
$ALG_i$ is the cost of the solution of the 1-dimensional problem. Thus the cost of the solution is at
most $2 \cdot (23/18 + O(\epsilon))$ times the cost of the optimum solution. We now need to verify that the set
of rectangles output by the algorithm is indeed a solution.

Subtract the weight of each rectangle from all $a_{ij}$'s covered by the rectangle. We need to prove
that the residual matrix

$$a'_{ij} = a_{ij} - \sum_{i_1, j_1, j_2 \,:\, (i,j) \in Rect(i_1, m, j_1, j_2)} w(i_1, m, j_1, j_2)$$

equals zero. Observe that $\Delta'_{ij} = a'_{i+1,j+1} + a'_{ij} - a'_{i+1,j} - a'_{i,j+1} = 0$ for all $0 \le i \le m-1$ (i.e.,
all rows $i$, possibly, except for the bottom line) and $0 \le j \le n$. Assume that not all $a'_{ij}$ equal
0. Let $a'_{i_0 j_0}$ be the first nonzero $a'_{ij}$ with respect to the lexicographical order on $(i, j)$. Then
$a'_{i_0-1,j_0-1} = a'_{i_0-1,j_0} = a'_{i_0,j_0-1} = 0$. Since $\Delta'_{i_0-1,j_0-1} = 0$, it follows that $a'_{i_0 j_0} = 0$, a contradiction.
We have proven the following theorem. 

Theorem 9. For every positive $\epsilon$, there exists a polynomial-time approximation algorithm for
AllRects with approximation guarantee at most $23/9 + O(\epsilon)$, with $23/9 = 2.5555\ldots$.



6.3 A Simplified Algorithm 

Because of the dependence on $\epsilon$, the running time of the previous algorithm can be large when $\epsilon$
is small. A simpler algorithm for the 1-dimensional case (namely, just use pairs and triples) can
be shown to give ratio 4/3 for the 1-d case, and hence $8/3 = 2.6666\ldots$ in 2-d, only slightly worse
than 23/9. For the simplified 1-d algorithm, the running time is $O(n + k^2 \log k)$, if there are $k$
$\Delta$'s. To run the 2-d algorithm, the running time becomes $O(n^2 + \sum_{i=1}^{n} k_i^2 \log k_i)$, where there are
$k_i$ corners on the $i$th row. Since the number of corners is $\Theta(OPT)$, the running time is at most
$O(n^2)$ plus $O(\max_{k_1 + k_2 + \cdots + k_n = OPT} \sum_{i=1}^{n} k_i^2 \log k_i)$. Since $f(x) = x^2 \log x$ is convex, this quantity is
maximized by making as many $k_i$'s equal to $n$ as possible. A simple proof then shows that the time
is $O(n^2 + OPT \cdot (n \log n))$.
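One way to read "just use pairs and triples" is sketched below. The text does not spell the procedure out, so the greedy order, the two-pointer triple search, and the choice to put all leftover weights into a single final set are assumptions of this sketch; it makes no claim to match the 4/3 ratio or the stated running time. The name `pairs_and_triples` is hypothetical.

```python
from collections import Counter

def find_zero_triple(vals):
    """vals is sorted ascending; return indices (i, lo, hi) of a zero-sum triple, or None."""
    for i in range(len(vals) - 2):
        lo, hi = i + 1, len(vals) - 1
        while lo < hi:
            s = vals[i] + vals[lo] + vals[hi]
            if s == 0:
                return i, lo, hi
            if s < 0:
                lo += 1
            else:
                hi -= 1
    return None

def pairs_and_triples(weights):
    """Partition nonzero weights (summing to zero) into zero-sum sets:
    first pairs {v, -v}, then greedily chosen triples, then one set with the rest."""
    left = Counter(weights)
    sets = []
    for v in sorted(left):
        if v > 0:
            k = min(left[v], left[-v])
            sets.extend([v, -v] for _ in range(k))
            left[v] -= k
            left[-v] -= k
    rest = sorted(left.elements())
    while True:
        hit = find_zero_triple(rest)
        if hit is None:
            break
        i, lo, hi = hit
        sets.append([rest[i], rest[lo], rest[hi]])
        for idx in (hi, lo, i):       # delete from the back so earlier indices stay valid
            del rest[idx]
    if rest:
        sets.append(rest)             # the leftover weights also sum to zero
    return sets

weights = [4, -4, 1, 2, -3, 5, 6, -11]
sets = pairs_and_triples(weights)
print(sets, len(weights) - len(sets))  # the second value is the cost: elements minus sets
```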
A Proof of Theorem 4



In the main part of the paper we proved that the expected cost of the solution returned by the
algorithm is at most $4|OPT_2(A)|$. We now improve this bound to $2|OPT_2(A)|$.

Proof. We have shown (see bounds (3) and (1)) that the expected cost of the solution is bounded
by

Fix a node $v \ne root$. Let $p^0(v) = v$; let $p^1(v) = p(v)$ be the parent of $v$; let $p^2(v) = p(p(v))$ be the
grandparent, etc. Finally, let $p^k(v)$, say, be the root, $k$ depending implicitly on $v$. Node $p^0(v) = v$
belongs to $path(v)$ with probability 1; $v$ belongs to $path(p^1(v))$ with probability $1/d(p^1(v))$;
it belongs to $path(p^2(v))$ with probability $1/(d(p^1(v))\, d(p^2(v)))$, etc. It belongs to $path(u)$ with
probability 0 if $u$ is not an ancestor of $v$. Thus

we get a telescoping sum

We have proven that the algorithm finds a 2-approximation. □



B NP-hardness of Tree x Tree 

In this section we sketch a proof that Tree x Tree is NP-hard. We show that the problem is NP-hard
even if each of the trees is a star. We construct a reduction from the Directed Hamiltonian
Path problem. Let $G = (V, E)$ be a directed graph. Fix a parameter M = (10max{|V|,
For every vertex $u$, we define $M$ rows of our matrix, which we denote $R_1(u), \ldots, R_M(u)$. For every
directed edge $(u, v)$, we define $M$ columns of our matrix, which we denote $C_1(uv), \ldots, C_M(uv)$.
Thus our matrix has dimensions $(M \cdot |V|) \times (M \cdot |E|)$. The trees are stars, thus allowed rectangles
are the whole matrix, individual rows, individual columns and individual cells. In our example the
gap between the values of "yes" and "no" instances will be larger than the number of rows plus the
number of columns. Thus, we may assume that rectangles corresponding to columns and rows are
free to use. In this case, we may also assume that the weight of the rectangle covering the whole
matrix is 0 (instead of having this rectangle with weight $w$ in the solution we may just increase
the value of all columns by $w$). Denote by $x_i(z)$ the variable for the rectangle corresponding to
row $R_i(z)$ (possibly 0); denote by $y_j(uv)$ the variable for the rectangle corresponding to column
$C_j(uv)$; denote the entry of the matrix at the intersection of the row $R_i(z)$ and the column $C_j(uv)$
by $a_{ij}(z, uv)$. Then the cost of the solution equals the number of individual cells with nonzero
weight, i.e., the number of unsatisfied equations

$$x_i(z) + y_j(uv) = a_{ij}(z, uv).$$

Thus the problem is to find values of variables $x_i(z)$ and $y_j(uv)$ so as to minimize the number of
unsatisfied equations. Remember, however, that we need to guarantee a gap of at least $M \cdot |V| + M \cdot |E|$
between the values of "yes" and "no" instances.

We set $a_{ij}(u, uv) = 0$ for every vertex $u$ and every edge $(u, v)$. We set $a_{ij}(v, uv) = 1$ for every
vertex $v$ and every edge $(u, v)$. We call the rest of the matrix entries, i.e., entries $a_{ij}(z, uv)$, where
$z \ne u$ and $z \ne v$, "bad entries." Let us pretend for a while that there are no bad entries and that
there are no equations corresponding to bad entries. (Later we will set $a_{ij}(z, uv) = ij$.)

We claim that if the graph has a directed Hamiltonian path then there exists a solution with
at most $(|E| - |V| + 1) \cdot M^2$ unsatisfied equations. Let $pos(u)$ be the position of the vertex $u$ in the
Hamiltonian path: 1st, 2nd, 3rd, etc. Then we set $x_i(u) = pos(u)$ and $y_j(uv) = -pos(u)$. Observe
that if an edge $(u, v)$ belongs to the Hamiltonian path, then

$$x_i(u) + y_j(uv) = pos(u) - pos(u) = 0 = a_{ij}(u, uv)$$

and

$$x_i(v) + y_j(uv) = (pos(u) + 1) - pos(u) = 1 = a_{ij}(v, uv).$$

If an edge $(u, v)$ does not belong to the Hamiltonian path, then still

$$x_i(u) + y_j(uv) = pos(u) - pos(u) = 0 = a_{ij}(u, uv),$$

but

$$x_i(v) + y_j(uv) = pos(v) - pos(u) \ne 1 = a_{ij}(v, uv).$$

The number of unsatisfied equations thus equals $M^2 \cdot (|E| - |V| + 1)$.
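The count for a "yes" instance can be checked mechanically. The sketch below is illustrative: it ignores bad entries, as the text does at this point, and exploits the fact that $x_i(u)$ and $y_j(uv)$ do not depend on $i$ and $j$, so each violated vertex-edge pair contributes $M^2$ unsatisfied equations. The function name and the small example graph are hypothetical, and self-loops are assumed absent.

```python
def unsatisfied_good_equations(vertices, edges, path, M):
    """Count unsatisfied equations among the good entries for the assignment
    x_i(u) = pos(u), y_j(uv) = -pos(u), where pos comes from the Hamiltonian path."""
    pos = {u: k + 1 for k, u in enumerate(path)}
    x = {u: pos[u] for u in vertices}            # the same value for every i
    y = {(u, v): -pos[u] for (u, v) in edges}    # the same value for every j
    count = 0
    for (u, v) in edges:
        if x[u] + y[(u, v)] != 0:                # target a_ij(u, uv) = 0
            count += M * M
        if x[v] + y[(u, v)] != 1:                # target a_ij(v, uv) = 1
            count += M * M
    return count

V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")]
M = 3
print(unsatisfied_good_equations(V, E, ["a", "b", "c"], M), M * M * (len(E) - len(V) + 1))
```

Only the two edges off the path, $(a, c)$ and $(c, a)$, violate their second equation, giving $M^2 \cdot (|E| - |V| + 1)$ as claimed.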

Now we show that if the graph does not have a directed Hamiltonian path, then every solution
has cost at least $M^2 \cdot (|E| - |V| + 2)$. Assume to the contrary that there exists a solution of cost
less than $M^2 \cdot (|E| - |V| + 2)$. Since all variables $x_i(u)$ for a fixed $u$ and $i = 1, \ldots, M$ participate
in exactly the same equations we may assume that $x_i(u) = x_j(u)$ for all $i$ and $j$ in the optimal
solution. Similarly, we may assume that $y_i(uv) = y_j(uv)$ for all $i$ and $j$. (Recall that we now
ignore all bad equations.) If $x_i(z) + y_j(uv) = a_{ij}(z, uv)$ ($z = u$ or $z = v$), then the same equality
holds for every $i$ and $j$. Thus, the number of unsatisfied equations is at most $M^2 \cdot (|E| - |V| + 1)$
(since the number of unsatisfied equations is divisible by $M^2$). Consider an edge $(u, v)$ for which
$x_i(u) + y_j(uv) = 0$ and $x_i(v) + y_j(uv) = 1$. We have $x_i(v) - x_i(u) = 1$. The number of such edges
is at least $|V| - 1$ (since the number of edges for which $x_i(u) + y_j(uv) \ne 0$ or $x_i(v) + y_j(uv) \ne 1$ is
at most the total number of unsatisfied equations divided by $M^2$, i.e., $|E| - |V| + 1$, and the total
number of edges is $|E|$). Therefore, if we place vertex $u$ at position $x_i(u) + (1 - \min_s x_i(s))$ (recall
that $x_i(u)$ does not depend on $i$) we get a Hamiltonian path.

We are almost done. We only need to take care of bad equations. The idea is to set the rest
of the values $a_{ij}(z, uv)$ so that only very few bad equations can be satisfied. For each $z$ and every
edge $(u, v)$ we define an $M \times M$ matrix $a_{ij}(z, uv) = ij$. We claim that in every matrix $a_{ij}(\cdot, \cdot)$, the
number of satisfied equations is at most $3M^{3/2}$. We prove the claim in Lemma 11. Then for every
assignment of variables $x_i(z)$ and $y_j(uv)$, the total number of satisfied bad equations is at most
$|E| \cdot |V| \cdot 3M^{3/2} \le M^2/2$. Hence, the gap between "yes" and "no" instances is at least $M^2/2$.

Lemma 10. Consider an $M \times M$ matrix $a_{ij}$ of zeros and ones. Suppose that for every $i_1$, $i_2$, $j_1$
and $j_2$ ($i_1 \ne i_2$ and $j_1 \ne j_2$) at most three out of the four values $a_{i_1 j_1}$, $a_{i_1 j_2}$, $a_{i_2 j_1}$, $a_{i_2 j_2}$ equal 1.
Then the number of ones in the matrix is at most $3M^{3/2}$.

Proof. Perform the following algorithm: While there exists a column containing at least $\sqrt{M}$ ones,
pick one such column $j$. Remove all of the at least $\sqrt{M}$ rows $i$ that have a 1 at the intersection
with column $j$.

When the algorithm stops, the remaining matrix has at most $M^{3/2}$ ones. Let $R_t \ge \sqrt{M}$ be the
number of rows removed at step $t$. At every step $t$, the algorithm removes $M R_t$ entries, among
which there are at most $R_t + (M - 1)$ ones ($R_t$ ones in the selected column and at most one in
each of the remaining $M - 1$ columns, by hypothesis). Hence, the fraction of removed ones among
all removed entries is at most $(R_t + M)/(M R_t) = 1/M + 1/R_t$. Thus the total number of removed
ones is at most $M^2 \cdot \max_t (1/M + 1/R_t) \le M + M^{3/2}$. We get that the total number of ones present in
the original matrix is at most $M + M^{3/2}$ plus the at most $M^{3/2}$ ones in the resulting matrix, or at
most $M + 2M^{3/2} \le 3M^{3/2}$. □

Lemma 11. Consider a system of linear equations

$$x_i + y_j = ij.$$

For all possible $x_i$ and $y_j$ the number of satisfied equations is at most $3M^{3/2}$.

Proof. Observe that for every $i_1$, $i_2$, $j_1$ and $j_2$ ($i_1 \ne i_2$ and $j_1 \ne j_2$), it is not possible to satisfy all
four equations: $x_{i_1} + y_{j_1} = i_1 j_1$, $x_{i_1} + y_{j_2} = i_1 j_2$, $x_{i_2} + y_{j_1} = i_2 j_1$, and $x_{i_2} + y_{j_2} = i_2 j_2$, since if all
four of them are satisfied then

$$i_1 j_1 + i_2 j_2 = x_{i_1} + y_{j_1} + x_{i_2} + y_{j_2} = i_1 j_2 + i_2 j_1,$$

but $i_1 j_1 + i_2 j_2 \ne i_1 j_2 + i_2 j_1$ (since $i_1 (j_2 - j_1) \ne i_2 (j_2 - j_1)$). Lemma 10 now implies that the number
of satisfied equations is at most $3M^{3/2}$. □
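The key step, that no two rows and two columns can have all four of their equations satisfied, amounts to the identity $i_1 j_1 + i_2 j_2 - i_1 j_2 - i_2 j_1 = (i_1 - i_2)(j_1 - j_2) \ne 0$. The following sketch checks the consequence on small examples; the helper names `satisfied_cells` and `has_full_rectangle` are hypothetical, and the assignments tried are arbitrary.

```python
from itertools import combinations

def satisfied_cells(x, y):
    """Cells (i, j), 1-based, where x_i + y_j = i * j holds."""
    M = len(x)
    return {(i, j) for i in range(1, M + 1) for j in range(1, M + 1)
            if x[i - 1] + y[j - 1] == i * j}

def has_full_rectangle(cells):
    """True if two distinct rows share two distinct columns, i.e. all four
    corners of some 2 x 2 pattern are satisfied."""
    rows = {}
    for i, j in cells:
        rows.setdefault(i, set()).add(j)
    return any(len(rows[i1] & rows[i2]) >= 2 for i1, i2 in combinations(rows, 2))

M = 6
cells = satisfied_cells(list(range(1, M + 1)), [0, 3, 4, 0, 0, 0])
print(len(cells), has_full_rectangle(cells), 3 * M ** 1.5)
```

With $x_i = i$ and $y = (0, 3, 4, 0, 0, 0)$, the satisfied cells are the whole first column plus $(3, 2)$ and $(2, 3)$, well below the bound $3M^{3/2}$ and without any full rectangle.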

C A Running Time Comparison Between The Present Algorithms And Natarajan's

Of course it is not fair to compare our algorithms, which approximately solve the exact problems, 
with Natarajan's, which approximately solves the inexact L2 problem. Of course the optimal value 
for our problem, being exact, is at least as large as the optimal value for Natarajan's problem. 
While Natarajan's algorithm is very general, the price paid is that it's slow. 

For problem Tree x Tree, our algorithm takes time $O(dn^2)$ in total, which is $O(d)$ times the
input size of $n^2$, where $d \le n$ is the smaller of the depths of the two trees; typically one expects $d$
to be $O(\log n)$ (or constant) in applications. Natarajan's algorithm takes time O(n^) even for each
iteration.

For problem AllRects, the contrast between the running times of our algorithm and Natarajan's
is even more stark. Our simplified 8/3-approximation algorithm runs in time $O(n^2 + OPT \cdot (n \log n))$
(where the input size is $n^2$) with $OPT \le n^2$, whereas Natarajan's takes time Ω(n^) per
iteration. This makes Natarajan's algorithm wildly impractical for the large instances which often
occur in database applications.